


Appendices

Neural Information Processing Systems

Appendix A provides derivations supporting Section 3 of the main paper. In this section we give detailed derivations of the ST-DGMRF joint distribution, for both first-order transition models (Section A.1) and higher-order transition models (Section A.2).

A.1 Joint distribution

The LDS (see Sections 2.2 and 3.1 in the main paper) defines a joint distribution over the system states x_{0:T}. First, note that Eq. (1) can be rewritten as a set of linear equations in x_{0:T}; we make use of this property both in the DGMRF formulation and in the conjugate gradient method. Eq. 11 is converted into a discrete-time dynamical system by approximating the continuous-time evolution of ρ over discrete time steps. We consider two ST-DGMRF variants that capture different amounts of prior knowledge, and the DGMRF transition matrices can be parameterized accordingly.

The air quality dataset is based on hourly PM2.5 measurements obtained from [ ]. The raw PM2.5 measurements are log-transformed and standardized to zero mean and unit variance. About 50% of the nodes are masked out (shown as purple nodes in the corresponding figure). We use a simple MLP with one hidden layer of width 16, ReLU activations, and no output non-linearity. The DGMRF parameters are not shared across time, allowing for dynamically changing spatial covariance patterns.
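To make the stacking of the LDS into a single linear system concrete, the following minimal NumPy/SciPy sketch builds the block-bidiagonal system matrix, forms the implied joint precision, and recovers the prior mean with the conjugate gradient method. The transition matrix F, bias c, and precisions Q0 and Q are illustrative toy values, not the paper's parameterization.

```python
import numpy as np
from scipy.sparse.linalg import cg

# Toy sizes; F, c, mu0, Q0, Q are illustrative assumptions.
N, T = 5, 4                        # spatial nodes, time steps (T+1 states)
F = 0.9 * np.eye(N)                # transition matrix (shared across time here)
c = 0.1 * np.ones(N)               # constant bias term
mu0 = np.zeros(N)                  # initial mean
Q0 = np.eye(N)                     # initial-state precision
Q = 2.0 * np.eye(N)                # process-noise precision

# Stack the LDS x_t = F x_{t-1} + c + eps_t into one linear system
# G x_{0:T} = b + eps, with identity blocks on the diagonal of G and
# -F on the first subdiagonal.
D = N * (T + 1)
G = np.eye(D)
for t in range(1, T + 1):
    G[t * N:(t + 1) * N, (t - 1) * N:t * N] = -F

# Block-diagonal precision of the stacked noise vector eps.
Q_full = np.kron(np.eye(T + 1), Q)
Q_full[:N, :N] = Q0

# Joint precision of x_{0:T}: Omega = G^T Q_full G (sparse and SPD).
Omega = G.T @ Q_full @ G

# The prior mean mu solves G mu = b with b = [mu0, c, ..., c], i.e.
# Omega mu = G^T Q_full b, solved here with conjugate gradients so the
# precision matrix never has to be inverted explicitly.
b = np.concatenate([mu0] + [c] * T)
mu, info = cg(Omega, G.T @ Q_full @ b)
assert info == 0  # CG converged
```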
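For the architecture detail above, a minimal PyTorch sketch of an MLP with one hidden layer of width 16, ReLU activations, and no output non-linearity might look as follows; the input and output dimensions are placeholders, not values fixed by the text.

```python
import torch.nn as nn

# One hidden layer of width 16, ReLU activation, linear output head.
# in_dim and out_dim are assumed placeholders.
in_dim, out_dim = 8, 1
mlp = nn.Sequential(
    nn.Linear(in_dim, 16),
    nn.ReLU(),
    nn.Linear(16, out_dim),  # no output non-linearity
)
```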


Supplementary Material

Neural Information Processing Systems

The supplementary material is organized as follows. Section B.1 gives details of the definitions and notation. Then, we provide the technical details of the lower bound (Lemma 3.3). In Section D.4 we provide insights into auto-labeling. This suggests that, in these settings, auto-labeling using active learning followed by selective classification is expected to work well. This idea is captured by Chow's excess risk [ ]. Nevertheless, exploring the connections between auto-labeling and active learning with abstention would be interesting future work.
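As a concrete illustration of the selective-classification step, the toy sketch below applies Chow's rule: predict the argmax class, and abstain whenever the maximum predicted probability falls below 1 - c, where c is the abstention cost. The probabilities, cost, and function name are illustrative assumptions, not the paper's method.

```python
import numpy as np

def chow_predict(probs: np.ndarray, reject_cost: float):
    """Predict the argmax class; abstain when max probability < 1 - reject_cost."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    abstain = conf < 1.0 - reject_cost
    return preds, abstain

probs = np.array([[0.95, 0.05],   # confident -> auto-label
                  [0.55, 0.45]])  # uncertain -> abstain, query instead
preds, abstain = chow_predict(probs, reject_cost=0.2)
# preds -> [0, 0]; abstain -> [False, True]
```

Confident points are auto-labeled, while abstained points are exactly the candidates an active-learning loop would send for human labels.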




Appendix

Neural Information Processing Systems

I{·} denotes the indicator function. It is sufficient to prove that the denominator converges to that of softmax at each point. We have shown that softmax is translation invariant with respect to constant shifts of its input. Without loss of generality, we use τ = 1 in the following proof. To begin with, we prove the first equation and then give the proof of the second part of Theorem 3.3. We introduce some additional notation that is used throughout the proof.
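To illustrate the properties used above, here is a small NumPy sketch of a temperature-scaled softmax: adding a constant to every input leaves the output unchanged (translation invariance), which also justifies the usual max-subtraction for numerical stability. This is a generic textbook implementation, not the paper's code.

```python
import numpy as np

def softmax(z, tau=1.0):
    z = np.asarray(z, dtype=float) / tau
    z = z - z.max()          # safe shift: softmax is translation invariant
    e = np.exp(z)
    return e / e.sum()       # denominator = sum of exponentials

z = np.array([2.0, 1.0, 0.1])
# Translation invariance: shifting every logit by a constant changes nothing.
assert np.allclose(softmax(z), softmax(z + 5.0))
# Temperature tau rescales the logits; tau = 1 recovers the standard softmax.
assert np.allclose(softmax(z, tau=1.0), softmax(z))
```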